Abstract

Gimbert and Horn gave an algorithm for solving simple stochastic games with running time
$O(r!\,n)$, where $n$ is the number of positions of the simple stochastic game and $r$ is the number of
its coin toss positions. Chatterjee et al. pointed out that a variant of strategy iteration can be
implemented to solve this problem in time $4^r r^{O(1)} n^{O(1)}$. In this paper, we show that an algorithm
combining value iteration with retrograde analysis achieves a time bound of $O(r 2^r (r \log r + n))$,
thus improving both time bounds. While the algorithm is simple, the analysis leading to
this time bound is involved, using techniques of extremal combinatorics to identify worst case
instances for the algorithm.

1 Introduction

Simple stochastic games form a class of two-player zero-sum games played on graphs that was
introduced to the algorithms and complexity community by Condon [6]. A simple stochastic game is
given by a directed finite (multi-)graph $G = (V, E)$, with the set of vertices $V$ also called positions
and the set of arcs $E$ also called actions. There is a partition of the positions into $V_1$ (positions
belonging to player Max), $V_2$ (positions belonging to player Min), $V_R$ (coin toss positions), and a
special terminal position GOAL. Positions of $V_1$, $V_2$, $V_R$ have exactly two outgoing arcs, while the
terminal position GOAL has none. We shall use $r$ to denote $|V_R|$ (the number of coin toss positions)
and $n$ to denote $|V| - 1$ (the number of non-terminal positions) throughout the paper. Between
moves, a pebble is resting at one of the positions $k$. If $k$ belongs to a player, this player should
strategically pick an outgoing arc from $k$ and move the pebble along this arc to another node. If
$k$ is a position in $V_R$, Nature picks an outgoing arc from $k$ uniformly at random and moves the
pebble along this arc. The objective of player Max is to reach GOAL, and he should play
so as to maximize his probability of doing so. The objective of player Min is to minimize player
Max's probability of reaching GOAL.

A strategy for a simple stochastic game is a (possibly randomized) procedure for selecting which
arc or action to take, given the history of the play so far. A positional strategy is the very special
case of this where the choice is deterministic and only depends on the current position, i.e., a
positional strategy is simply a map from positions to actions. If player Max plays using strategy $x$
and player Min plays using strategy $y$, and the play starts in position $k$, a random play $p(x, y, k)$
of the game is induced. We let $u^k(x, y)$ denote the probability that player Max will reach GOAL in
this random play. A strategy $x^*$ for player Max is said to be optimal if for all positions $k$ it holds
that

$$\inf_{y \in S_2} u^k(x^*, y) \ge \sup_{x \in S_1} \inf_{y \in S_2} u^k(x, y), \qquad (1)$$

where $S_1$ ($S_2$) is the set of strategies for player Max (Min). Similarly, a strategy $y^*$ for player Min
is said to be optimal if

$$\sup_{x \in S_1} u^k(x, y^*) \le \inf_{y \in S_2} \sup_{x \in S_1} u^k(x, y). \qquad (2)$$

A general theorem of Liggett and Lippman ([12], fixing a bug in a proof of Gillette [9]), restricted
to simple stochastic games, implies that:

• Optimal positional strategies $x^*, y^*$ for both players exist.

• For such optimal $x^*, y^*$ and for all positions $k$,

$$\min_{y \in S_2} u^k(x^*, y) = \max_{x \in S_1} u^k(x, y^*).$$

This number is called the value of position $k$. We shall denote it $\mathrm{val}(G)_k$ and the vector of
values $\mathrm{val}(G)$.

In this paper, we consider quantitatively solving simple stochastic games, by which we mean
computing the values of all positions of the game, given an explicit representation of $G$. Once a simple
stochastic game has been quantitatively solved, optimal strategies for both players can be found
in linear time [2]. However, it was pointed out by Anne Condon twenty years ago that no worst
case polynomial time algorithm for quantitatively solving simple stochastic games is known. By
now, finding such an algorithm is a celebrated open problem. Gimbert and Horn [10] pointed out
that the problem of solving simple stochastic games parametrized by $r = |V_R|$ is fixed parameter
tractable. That is, simple stochastic games with "few" coin toss positions can be solved efficiently.
The algorithm of Gimbert and Horn runs in time $r!\,n^{O(1)}$. The next natural step in this direction is
to try to find an algorithm with a better dependence on the parameter $r$. Thus, Dai and Ge [8] gave
a randomized algorithm with expected running time $\sqrt{r!}\,n^{O(1)}$. Chatterjee et al. [4] pointed out that
a variant of the standard algorithm of strategy iteration devised earlier by the same authors [5] can
be applied to find a solution in time $4^r r^{O(1)} n^{O(1)}$ (they only state a time bound of $2^{O(r)} n^{O(1)}$, but
a slightly more careful analysis yields the stated bound). The dependence on $n$ in this bound is at
least quadratic. The main result of this paper is an algorithm running in time $O(r 2^r (r \log r + n))$,
thus improving all of the above bounds. More precisely, we show:

Theorem 1 Assuming unit cost arithmetic on numbers of bit length up to $O(r)$, simple stochastic
games with $n$ positions out of which $r$ are coin toss positions can be quantitatively solved in time
$O(r 2^r (r \log r + n))$.

The algorithm is based on combining a variant of value iteration [15, 7] with retrograde analysis
[2, 1]. We should emphasize that the time bound of Theorem 1 is valid only for simple stochastic
games as originally defined by Condon. The algorithm of Gimbert and Horn (and also the algorithm
Function SolveSSG($G$)
    $v \leftarrow (1, 0, \ldots, 0)$;
    for $i \in \{1, 2, \ldots, 2(\ln 2^{5\max(r,6)+1}) \cdot 2^{\max(r,6)}\}$ do
        $v \leftarrow$ SolveDGG($G$, $v$);
        $v' \leftarrow v$;
        $v_k \leftarrow (v'_j + v'_\ell)/2$, for all $k \in V_R$, $v_j$ and $v_\ell$ being the two successors of $v_k$;
        Round each value $v_k$ down to $7r$ binary digits;
    $v \leftarrow$ SolveDGG($G$, $v$);
    $v \leftarrow$ KwekMehlhorn($v$, $4^r$);
    return $v$

Figure 1: Algorithm for solving simple stochastic games

of Dai and Ge, though this is not stated in their paper) actually applies to a generalized version
of simple stochastic games where coin toss positions are replaced with chance positions that are
allowed arbitrary out-degree and where a not-necessarily-uniform distribution is associated to the
outgoing arcs. The complexity of their algorithm for this more general case is $O(r!(|E| + p))$, where
$p$ is the maximum bit-length of a transition probability (they only claim $O(r!(n|E| + p))$, but by
using retrograde analysis in their Proposition 1, the time is reduced by a factor of $n$). The algorithm
of Dai and Ge has analogous expected complexity, with the $r!$ factor replaced with $\sqrt{r!}$. While our
algorithm and the strategy improvement algorithm of Chatterjee et al. can be generalized to also
work for these generalized simple stochastic games, the dependence on the parameter $p$ would be
much worse, in fact exponential in $p$. It is an interesting open problem to get an algorithm with a
complexity polynomial in $2^r$ as well as $p$, thereby combining the desirable features of the algorithms
based on strategy iteration and value iteration with the features of the algorithm of Gimbert and
Horn.

1.1 Organization of paper 

In Section 2 we present the algorithm and show how the key to its analysis is to give upper bounds
on the difference between the value of a given simple stochastic game and the value of a time
bounded version of the same game. In Section 3 we then prove such upper bounds. In fact, we
offer two such upper bounds: one bound with a relatively direct proof, leading to a variant of
our algorithm with time complexity $O(r^2 2^r (r + n \log n))$, and an optimal bound on the difference
in value, shown using techniques from extremal combinatorics, leading to an algorithm with time
complexity $O(r 2^r (r + n \log n))$. In the Conclusion section, we briefly sketch how our technique
also yields an improved upper bound on the time complexity of the strategy iteration algorithm of
Chatterjee et al.

2 The algorithm 

2.1 Description of the algorithm 

Our algorithm for solving simple stochastic games with few coin toss positions is the algorithm of
Figure 1.
Procedure ModifiedValueIteration($G$)
    $v \leftarrow (1, 0, \ldots, 0)$;
    while true do
        $v \leftarrow$ SolveDGG($G$, $v$);
        $v' \leftarrow v$;
        $v_k \leftarrow (v'_j + v'_\ell)/2$, for all $k \in V_R$, $v_j$ and $v_\ell$ being the two successors of $v_k$;

Figure 2: Modified value iteration

In this algorithm, the vectors $v$ and $v'$ are real-valued vectors indexed by the positions of $G$.
We assume the GOAL position has the index 0, so $v = (1, 0, \ldots, 0)$ is the vector that assigns 1 to the
GOAL position and 0 to all other positions. SolveDGG is the retrograde analysis based algorithm
from Proposition 1 in Andersson et al. [1] for solving deterministic graphical games. Deterministic
graphical games are defined in a similar way as simple stochastic games, but they do not have coin
toss positions, and arbitrary real payoffs are allowed at terminals. The notation SolveDGG($G$, $v'$)
means solving the deterministic graphical game obtained by replacing each coin toss position $k$ of
$G$ with a terminal with payoff $v'_k$, and returning the value vector of this deterministic graphical
game. Finally, KwekMehlhorn is the algorithm of Kwek and Mehlhorn [11]. KwekMehlhorn($v$, $q$)
returns a vector where each entry $v_i$ in the vector $v$ is replaced with the smallest fraction $a/b$ with
$a/b \ge v_i$ and $b \le q$.
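The rounding step specified above can be illustrated with a minimal Python sketch. It is a brute-force stand-in that tries every denominator up to $q$, not the Kwek-Mehlhorn algorithm itself (which is considerably faster); the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def smallest_fraction_at_least(v: Fraction, q: int) -> Fraction:
    """Smallest a/b with a/b >= v and 1 <= b <= q (brute force over b)."""
    best = None
    for b in range(1, q + 1):
        # Smallest integer a with a/b >= v, i.e. a = ceil(v * b), computed exactly.
        a = -((-v.numerator * b) // v.denominator)
        candidate = Fraction(a, b)
        if best is None or candidate < best:
            best = candidate
    return best


def kwek_mehlhorn_stand_in(vector, q):
    """Apply the rounding entrywise (hypothetical stand-in for KwekMehlhorn(v, q))."""
    return [smallest_fraction_at_least(Fraction(x), q) for x in vector]
```

For example, with $q = 4$ the entry $333/1000$ is replaced by $1/3$, since no fraction with denominator at most 4 lies in the interval $[333/1000, 1/3)$.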

The complexity analysis of the algorithm is straightforward, given the analyses of the procedures
SolveDGG and KwekMehlhorn from [1, 11]. There are $O(r 2^r)$ iterations, each requiring time
$O(r \log r + n)$ for solving the deterministic graphical game. Finally, the Kwek-Mehlhorn algorithm
requires time $O(r)$ for each replacement, and there are only $r$ replacements to be made, as there are
only $r$ distinct entries different from 1 in the vector $v$, corresponding to the $r$ coin toss positions,
by standard properties of deterministic graphical games [1].
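To make the control flow of the main algorithm concrete, here is a small Python sketch under several loudly stated assumptions. `solve_dgg` is a naive stand-in that runs $n$ rounds of max/min updates instead of the retrograde analysis of [1], so it does not achieve the running time claimed above. The final rounding tries all denominators by brute force instead of using the Kwek-Mehlhorn algorithm. The sketch pads $r$ up to 6 in both the iteration count and the rounding precision, which is one reading of the remark in the correctness proof that $r \ge 6$ may be assumed. All function names and the game encoding (`kind`, `succ`) are hypothetical.

```python
from fractions import Fraction
from math import ceil, log

GOAL = 0  # the text assumes GOAL has index 0


def solve_dgg(kind, succ, payoff):
    """Naive stand-in for SolveDGG: coin toss positions are terminals with the given
    payoffs; n synchronous max/min rounds starting from 0 (infinite play is worth 0)."""
    n = len(kind)
    v = [Fraction(0)] * n
    v[GOAL] = Fraction(1)
    for k, p in payoff.items():
        v[k] = p
    for _ in range(n):
        new = list(v)
        for k in range(n):
            if kind[k] == "max":
                new[k] = max(v[j] for j in succ[k])
            elif kind[k] == "min":
                new[k] = min(v[j] for j in succ[k])
        v = new
    return v


def round_up_to_fraction(x, q):
    """Smallest a/b >= x with b <= q (brute force, not Kwek-Mehlhorn)."""
    return min(Fraction(-((-x.numerator * b) // x.denominator), b) for b in range(1, q + 1))


def solve_ssg(kind, succ):
    coins = [k for k, t in enumerate(kind) if t == "coin"]
    big_r = max(len(coins), 6)                       # assumption: pad r up to 6
    iterations = ceil(2 * log(2 ** (5 * big_r + 1)) * 2 ** big_r)
    scale = 2 ** (7 * big_r)                         # keep 7 * big_r binary digits
    payoff = {k: Fraction(0) for k in coins}         # coin toss entries of v
    for _ in range(iterations):
        v = solve_dgg(kind, succ, payoff)
        for k in coins:
            a, b = succ[k]
            avg = (v[a] + v[b]) / 2
            payoff[k] = Fraction(avg.numerator * scale // avg.denominator, scale)  # round down
    v = solve_dgg(kind, succ, payoff)
    return [round_up_to_fraction(x, 4 ** len(coins)) for x in v]
```

As a usage example, take positions GOAL (0), two coin toss positions 1 and 2 with successors (GOAL, 2) and (1, 3), a Min position 3 whose two arcs are self-loops, and a Max position 4 with successors (1, 2). Calling `solve_ssg(["goal", "coin", "coin", "min", "max"], [(), (0, 2), (1, 3), (3, 3), (1, 2)])` returns the values $1, 2/3, 1/3, 0, 2/3$.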

2.2 Proof of correctness of the algorithm 

We analyse our main algorithm by first analysing properties of a simpler non-terminating algorithm,
depicted in Figure 2. We shall refer to this algorithm as modified value iteration.

Let $v^t$ be the content of the vector $v$ immediately after executing SolveDGG in the $(t+1)$'st
iteration of the loop of ModifiedValueIteration on input $G$. To understand this variant of value
iteration, we may observe that the vectors $v^t$ can be given a "semantics" in terms of the value of a
time bounded game.

Definition 2 Consider the "timed modification" $G^t$ of the game $G$ defined as follows. The game
is played as $G$, except that play stops and player Max loses when the play has encountered $t + 1$
(not necessarily distinct) coin toss positions. We let $\mathrm{val}(G^t)_k$ be the value of $G^t$ when play starts
in position $k$.

Lemma 3 $\forall k, t : v^t_k = \mathrm{val}(G^t)_k$.

Proof Straightforward induction in t ("backwards induction"). □ 

From the semantics offered by Lemma 3 we immediately have $\forall k, t : \mathrm{val}(G^t)_k \le \mathrm{val}(G^{t+1})_k$.
Furthermore, it is true that $\lim_{t \to \infty} \mathrm{val}(G^t) = \mathrm{val}(G)$, where $\mathrm{val}(G)$ is the value vector of $G$. This
latter statement is very intuitive, given Lemma 3, but might not be completely obvious. It may be
established rigorously as follows:

Definition 4 For a given game $G$, let the game $\bar{G}^t$ be the following. The game is played as $G$,
except that play stops and player Max loses when the play has encountered $t + 1$ (not necessarily
distinct) positions. We let $\mathrm{val}(\bar{G}^t)_k$ be the value of $\bar{G}^t$ when play starts in position $k$.

(We note that $\mathrm{val}(\bar{G}^t)$ is the valuation computed after $t$ iterations of unmodified value iteration [7].)
A very general theorem of Mertens and Neyman [13] linking the value of an infinite game to the
values of its time limited versions implies that $\lim_{t \to \infty} \mathrm{val}(\bar{G}^t) = \mathrm{val}(G)$. Also, we immediately see
that for any $k$, $\mathrm{val}(\bar{G}^t)_k \le \mathrm{val}(G^t)_k \le \mathrm{val}(G)_k$, so we also have $\lim_{t \to \infty} \mathrm{val}(G^t)_k = \mathrm{val}(G)_k$.

To relate SolveSSG of Figure 1 to modified value iteration of Figure 2, it turns out that we
want to upper bound the smallest $t$ for which

$$\forall i : \mathrm{val}(G)_i - \mathrm{val}(G^t)_i \le 2^{-5r}.$$

Let $T(G)$ be that $t$. We will bound $T(G)$ using two different approaches. The first, in Subsection
3.1, is rather direct and is included to show what may be obtained using completely elementary
means. It shows that $T(G) \le 5(\ln 2) \cdot r^2 \cdot 2^r$, for any game $G$ with $r$ coin toss positions (Lemma 7).

The second, in Subsection 3.2, identifies an extremal game (with respect to convergence rate)
with a given number of positions and coin toss positions. More precisely:

Definition 5 Let $S_{n,r}$ be the set of simple stochastic games with $n$ positions out of which $r$ are
coin toss positions. Let $G \in S_{n,r}$ be given. We say that $G$ is $t$-extremal if

$$\max_i (\mathrm{val}(G)_i - \mathrm{val}(G^t)_i) = \max_{H \in S_{n,r}} \max_j (\mathrm{val}(H)_j - \mathrm{val}(H^t)_j).$$

We say that $G$ is extremal if it is $t$-extremal for all $t$.

It is clear that $t$-extremal games exist for any choice of $n$, $r$ and $t$. (That extremal games
exist for any choice of $n$ and $r$ is shown later in the present paper.) To find an extremal game,
we use techniques from extremal combinatorics. By inspection of this game, we then get a better
upper bound on the convergence rate than that offered by the first approach. We show using this
approach that $T(G) \le 2(\ln 2^{5r+1}) \cdot 2^r$ for any game $G \in S_{n,r}$ (Corollary 14).

Assuming that an upper bound on $T(G)$ is available, we are now ready to finish the proof of
correctness of the main algorithm. We will only do so explicitly for the bound on $T(G)$ obtained
by the second approach from Subsection 3.2 (the weaker bound implies correctness of a version of
the algorithm performing more iterations of its main loop). From Corollary 14, we have that for
any game $G \in S_{n,r}$, $\mathrm{val}(G^t)_k$, and hence modified value iteration, approximates $\mathrm{val}(G)_k$ within
an additive error of $2^{-5r}$ for $t \ge 2(\ln 2^{5r+1}) \cdot 2^r$ and $k$ being any position. SolveSSG differs from
ModifiedValueIteration by rounding down the values in the vector $v$ in each iteration. Let $\tilde{v}^t$ be
the content of the vector $v$ immediately after executing SolveDGG in the $t$'th iteration of the loop
of SolveSSG. We want to compare $\mathrm{val}(G^t)_k$ with $\tilde{v}^t_k$ for any $k$. As each number is rounded down
by less than $2^{-7r}$ in each iteration of the loop, and recalling Lemma 3, we see by induction that

$$\mathrm{val}(G^t)_k - t\,2^{-7r} \le \tilde{v}^t_k \le \mathrm{val}(G^t)_k.$$
In particular, when $t = 2(\ln 2^{5r+1}) \cdot 2^r$, we have that $\tilde{v}^t_k$ approximates $\mathrm{val}(G)_k$ within
$2^{-5r} + 2(\ln 2^{5r+1}) \cdot 2^r 2^{-7r} < 2^{-4r}$, for any $k$, as we can assume $r \ge 6$ by the code of SolveSSG of Figure 1.
Lemma 2 of Condon [6] states that the value of a position in a simple stochastic game with $n$
non-terminal positions can be written as a fraction with integral numerator and denominator at
most $4^n$. As pointed out by Chatterjee et al. [4], it is straightforward to see that her proof in
fact gives an upper bound of $4^r$, where $r$ is the number of coin toss positions. It is well-known
that two distinct fractions with denominator at most $m \ge 2$ differ by at least $\frac{1}{m(m-1)}$. Therefore,
since $\tilde{v}^t_k$ approximates $\mathrm{val}(G)_k$ within $2^{-4r} < \frac{1}{4^r(4^r - 1)}$ from below, we in fact have that $\mathrm{val}(G)_k$
is the smallest rational number $p/q$ so that $q \le 4^r$ and $p/q \ge \tilde{v}^t_k$. Therefore, the Kwek-Mehlhorn
algorithm applied to $\tilde{v}^t_k$ correctly computes $\mathrm{val}(G)_k$, and we are done.
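The two numerical facts used in this argument can be checked mechanically. The following Python sketch (with hypothetical helper names) verifies by brute force that distinct fractions with denominator at most $m$ differ by at least $\frac{1}{m(m-1)}$ for small $m$, and evaluates the error budget $2^{-5r} + t \cdot 2^{-7r} < 2^{-4r} < \frac{1}{4^r(4^r-1)}$ in exact arithmetic, taking $t$ to be the iteration bound rounded up to an integer.

```python
from fractions import Fraction
from math import ceil, log


def min_gap(m):
    """Smallest positive difference between fractions in [0, 1] with denominator <= m."""
    fracs = sorted({Fraction(a, b) for b in range(1, m + 1) for a in range(b + 1)})
    return min(y - x for x, y in zip(fracs, fracs[1:]))


def error_budget_ok(r):
    """Check 2^(-5r) + t * 2^(-7r) < 2^(-4r) < 1 / (4^r (4^r - 1)) exactly."""
    t = ceil(2 * log(2 ** (5 * r + 1)) * 2 ** r)   # iteration bound, rounded up
    total = Fraction(1, 2 ** (5 * r)) + t * Fraction(1, 2 ** (7 * r))
    return total < Fraction(1, 2 ** (4 * r)) < Fraction(1, 4 ** r * (4 ** r - 1))
```

Restricting `min_gap` to the interval $[0, 1]$ loses nothing, because shifting both fractions by an integer does not change their difference. The budget check succeeds for $r = 6$ but fails for $r = 1$, which illustrates why some lower bound on $r$ is needed in the argument.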

We cannot use the bound on $T(G)$ obtained by the first direct approach (in Subsection 3.1) to
show the correctness of SolveSSG, but we can show the correctness of the version of it that runs the
main loop an additional factor of $O(r)$ times, that is, $i$ should range over $\{1, 2, \ldots, 5(\ln 2) \cdot r^2 2^r\}$
instead of over $\{1, 2, \ldots, 2(\ln 2^{5r+1}) \cdot 2^r\}$.

3 Bounds on the convergence rate 
3.1 A direct approach 

Lemma 6 Let $G \in S_{n,r}$ be given. For all positions $k$ and all integers $i \ge 1$, we have

$$\mathrm{val}(G)_k - \mathrm{val}(G^{i \cdot r})_k \le (1 - 2^{-r})^i.$$

Proof If $\mathrm{val}(G)_k = 0$, we also have $\mathrm{val}(G^{i \cdot r})_k = 0$, so the inequality holds. Therefore, we can
assume that $\mathrm{val}(G)_k > 0$.

Fix some optimal positional strategy, $x$, for Max in $G$. Let $y$ be any pure (i.e., deterministic,
but not necessarily positional) strategy for Min with the property that $y$ guarantees that the pebble
will not reach GOAL after having been in a position of value 0 (in particular, any best reply to any
strategy of Max, including $x$, clearly has this property).

The two strategies $x$ and $y$ together induce a probability space on the set of plays of the game,
starting in position $k$. Let the probability measure on plays of $G$ associated with these strategies be
denoted $\Pr_{\sigma_k}$. Let $W_k$ be the event that this random play reaches GOAL. We shall also consider
the event $W_k$ to be a set of plays. Note that any position occurring in any play in $W_k$ has non-zero
value, by definition of $y$.

Claim. There is a play in $W_k$ where each position occurs at most once.

Proof of Claim. Assume to the contrary that for all plays in $W_k$, some position occurs at least
twice. Let $y'$ be the modification of $y$ where the second time a position, $v$, in $V_2$ is entered in a
given play, $y'$ takes the same action as was used the first time $v$ occurred. Let $W'$ be the set of plays
generated by $x$ and $y'$ for which the pebble reaches GOAL. We claim that $W'$ is in fact the empty
set. Indeed, if $W'$ contains any play $q$, we can obtain a play in $W'$ where each position occurs only
once, by removing all transitions in $q$ occurring between repetitions of the same position. Such a
play is also an element of $W_k$, contradicting the assumption that all plays in $W_k$ have a position
occurring twice. The emptiness of $W'$ shows that the strategy $x$ does not guarantee that GOAL is
reached with positive probability, when play starts in $k$. This contradicts either that $x$ is optimal
or that $\mathrm{val}(G)_k > 0$. We therefore conclude that our assumption is incorrect, and that there is
a play $q$ in $W_k$ where each position occurs only once, as desired. This completes the proof of the
claim.

According to the probability measure $\sigma_k$, a given play in which each coin toss
position occurs only once has probability at least $2^{-r}$.

Let $W_k^i$ be the set of plays in $W_k$ that contain at most $i$ occurrences of coin toss positions
(and also let $W_k^i$ denote the corresponding event with respect to the measure $\sigma_k$). Since the above
claim holds for any position $k$ of non-zero value and plays in $W_k$ only visit positions of non-zero
value, we see that $\Pr_{\sigma_k}[\neg W_k^{i \cdot r} \mid W_k] \le (1 - 2^{-r})^i$, for any $i$. Since $x$ is optimal, we also have
$\Pr_{\sigma_k}[W_k] \ge \mathrm{val}(G)_k$. Therefore,

$$\Pr_{\sigma_k}[W_k^{i \cdot r}] = \Pr_{\sigma_k}[W_k] - \Pr_{\sigma_k}[\neg W_k^{i \cdot r} \mid W_k] \Pr_{\sigma_k}[W_k] \ge \mathrm{val}(G)_k - (1 - 2^{-r})^i.$$

The above derivation is true for any $y$ guaranteeing that no play can enter a position of value 0
and then reach GOAL, and therefore it is also true for $y$ being the optimal strategy in the
time-limited game, $G^{i \cdot r}$. In that case, we have $\Pr_{\sigma_k}[W_k^{i \cdot r}] \le \mathrm{val}(G^{i \cdot r})_k$. We can therefore conclude that
$\mathrm{val}(G^{i \cdot r})_k \ge \mathrm{val}(G)_k - (1 - 2^{-r})^i$, as desired. □



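Lemma 7 Let $G \in S_{n,r}$ be given. Then

$$T(G) \le 5(\ln 2) \cdot r^2 \cdot 2^r.$$

Proof We will show that for any $t \ge 5(\ln 2) \cdot r^2 \cdot 2^r$ and any $k$, we have that $\mathrm{val}(G)_k - \mathrm{val}(G^t)_k \le 2^{-5r}$.
From Lemma 6 we have that $\forall i, k : \mathrm{val}(G)_k - \mathrm{val}(G^{i \cdot r})_k \le (1 - 2^{-r})^i$. Thus,

$$\mathrm{val}(G)_k - \mathrm{val}(G^t)_k \le (1 - 2^{-r})^{5(\ln 2) r 2^r} = \left((1 - 2^{-r})^{2^r}\right)^{5(\ln 2) r} < e^{-5(\ln 2) r} = 2^{-5r}.$$

□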
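The final chain of inequalities can be sanity-checked numerically. The short Python sketch below (the helper name is hypothetical, and floating point arithmetic is assumed to be accurate enough for the small values of $r$ tried) compares $(1 - 2^{-r})^i$ for $i = \lceil 5(\ln 2)\, r\, 2^r \rceil$ with $2^{-5r}$.

```python
from math import ceil, log


def lemma7_sides(r):
    """Return ((1 - 2^-r)^i, 2^(-5r)) for i = ceil(5 ln(2) r 2^r) blocks of r coin tosses."""
    i = ceil(5 * log(2) * r * 2 ** r)
    return (1 - 2.0 ** -r) ** i, 2.0 ** (-5 * r)
```

The first component stays below the second because $(1 - 2^{-r})^{2^r} < e^{-1}$, which is the only analytic fact the proof uses.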



3.2 An extremal combinatorics approach 

The game of Figure 3 is a game in $S_{n,r}$. We will refer to this game as $E_{n,r}$. $E_{n,r}$ contains
no Max-positions. Each Min-position in $E_{n,r}$ has GOAL as a successor twice. The $i$'th coin toss
position, for $i \ge 2$, has the $(i-1)$'st and the $r$'th coin toss position as successors. The first coin
toss position has GOAL and the $r$'th coin toss position as successors. The game is very similar to a
simple stochastic game used as an example by Condon [7] to show that unmodified value iteration
converges slowly.

In this subsection we will show that $E_{n,r}$ is an extremal game in the sense of Definition 5 and
upper bound $T(E_{n,r})$, thereby upper bounding $T(G)$ for all $G \in S_{n,r}$.
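The slow convergence on $E_{n,r}$ is easy to observe experimentally. The sketch below (hypothetical function name; it models only the $r$ coin toss positions, since every Min-position of $E_{n,r}$ points directly to GOAL) runs modified value iteration on $E_{n,r}$ in exact arithmetic and reports the first $t$ at which every timed value is within $\epsilon$ of 1.

```python
from fractions import Fraction


def first_t_within(r, eps):
    """Smallest t with 1 - val(E^t)_k <= eps for all coin toss positions k of E_{n,r}.

    Index 0 is GOAL and indices 1..r are the coin toss positions: position i has
    successors i - 1 (GOAL when i = 1) and r.
    """
    v = [Fraction(1)] + [Fraction(0)] * r       # timed values for t = 0
    t = 0
    while 1 - min(v[1:]) > eps:
        v = [v[0]] + [(v[i - 1] + v[r]) / 2 for i in range(1, r + 1)]
        t += 1
    return t
```

For $r = 1$ the gap after $t$ steps is exactly $2^{-t}$, so `first_t_within(1, Fraction(1, 1024))` is 10. For larger $r$ the returned $t$ grows roughly like $2^{r+1} \ln(1/\epsilon)$, since reaching GOAL from the $r$'th position requires $r$ favourable coin tosses in a row.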

The first two lemmas in this subsection concern assumptions about $t$-extremal games we can
make without loss of generality.

Lemma 8 For all $n, r, t$, there is a $t$-extremal game in $S_{n,r}$ with $V_1 = \emptyset$, i.e., without
positions belonging to Max.
Figure 3: The extremal game $E_{n,r}$. Circle nodes are coin toss positions, triangle nodes are Min
positions and the node labeled GOAL is the GOAL position.

Proof Take any $t$-extremal game $G \in S_{n,r}$. Let $x$ be an optimal positional strategy for Max
in this game. Now replace each position belonging to Max with a position belonging to Min
with both outgoing arcs making the choice specified by $x$. Call the resulting game $H$. We claim
that $H$ is also $t$-extremal. First, clearly, each position $k$ of $H$ has the same value as it has in
$G$, i.e., $\mathrm{val}(G)_k = \mathrm{val}(H)_k$. Also, if we compare the values of the positions of the games $H^t$
and $G^t$ defined in the statement of Lemma 3, we see that $\mathrm{val}(H^t)_k \le \mathrm{val}(G^t)_k$, since the only
difference between $H^t$ and $G^t$ is that player Max has more options in the latter game. Therefore,
$\mathrm{val}(H)_k - \mathrm{val}(H^t)_k \ge \mathrm{val}(G)_k - \mathrm{val}(G^t)_k$, so $H$ must also be $t$-extremal. □

Lemma 9 For all $n, r, t$, there exists a $t$-extremal game in $S_{n,r}$, where all positions have value one
and where no positions belong to player Max.

Proof By Lemma 8, we can pick a $t$-extremal game $G$ in $S_{n,r}$ where no positions belong to player
Max. Suppose that not all positions in $G$ have value 1. Then, it is easy to see that the set of
positions of value 0 is non-empty. Let this set be $N$. Let $H$ be the game where all arcs into $N$
are redirected to GOAL. Clearly, all positions in this game have value 1. We claim that $H$ is also
$t$-extremal.

Fix a position $k$. We shall show that $\mathrm{val}(H)_k - \mathrm{val}(H^t)_k \ge \mathrm{val}(G)_k - \mathrm{val}(G^t)_k$ and we shall be
done. Let $\sigma_k$ be a (not necessarily positional) optimal strategy for player Min in $G^t$ for plays starting
in $k$ and let the probability measure on plays of $G^t$ associated with this strategy be denoted $\Pr_{\sigma_k}$.
As $\sigma_k$ is also a strategy that can be played in $G$, we have $\Pr_{\sigma_k}[\text{Play does not reach } N] \ge \mathrm{val}(G)_k$.
Also, by definition, $\Pr_{\sigma_k}[\text{Play reaches GOAL}] = \mathrm{val}(G^t)_k$. That is,

$$\Pr_{\sigma_k}[\text{Play reaches neither GOAL nor } N] \ge \mathrm{val}(G)_k - \mathrm{val}(G^t)_k.$$

Let $\bar{\sigma}_k$ be an optimal strategy for plays starting in $k$ for player Min in $H^t$. This strategy can also be
used in $G^t$. Let the probability distribution on plays of $G^t$ associated with this strategy be denoted
$\Pr_{\bar{\sigma}_k}$. Note that plays reaching GOAL in $H^t$ correspond to those plays reaching either GOAL or
$N$ in $G^t$. Thus, by definition, $\Pr_{\bar{\sigma}_k}[\text{Play reaches neither GOAL nor } N] = 1 - \mathrm{val}(H^t)_k$. As $\bar{\sigma}_k$ can
be used in $G^t$, where $\sigma_k$ is optimal, we have

$$1 - \mathrm{val}(H^t)_k \ge \mathrm{val}(G)_k - \mathrm{val}(G^t)_k.$$

But since $\mathrm{val}(H)_k = 1$, this is the desired inequality

$$\mathrm{val}(H)_k - \mathrm{val}(H^t)_k \ge \mathrm{val}(G)_k - \mathrm{val}(G^t)_k.$$

□

The next lemma will be used to derive an ordering of the positions in any game $G$ satisfying the
restrictions of Lemma 9.

Lemma 10 Let $G$ be a game without Max positions in which all positions have value one. Let
$V'$ be a non-empty set of positions of $G$ that does not include GOAL. Then, at least one of the
following two cases holds:

1. $V'$ contains a Min position with both successors outside of $V'$, or

2. $V'$ contains a coin toss position with at least one successor outside of $V'$.

Proof Suppose not. Then the Min-player can force play to stay within $V'$ when play starts in $V'$.
Thus, the values of all positions in $V'$ are 0, a contradiction. □

The following lemma will be used several times to change the structure of a game while only
making it more extremal, eventually making the game into the specific game $E_{n,r}$ (in the context
of extremal combinatorics, this is a standard technique pioneered by Moon and Moser [14]).

Lemma 11 Let a game $G$ be given. Let $c$ be a coin toss position in $G$ and let $k$ be an immediate successor
of $c$. Also, let a position $k'$ with the following property be given: $\forall t : \mathrm{val}(G^t)_{k'} \le \mathrm{val}(G^t)_k$.
Let $H$ be the game where the arc from $c$ to $k$ is redirected to $k'$. Then, $\forall t, j : \mathrm{val}(H^t)_j \le \mathrm{val}(G^t)_j$.

Proof In this proof we will throughout refer to the properties of ModifiedValueIteration and
use Lemma 3. We show by induction on $t$ that $\forall j, t : \mathrm{val}(H^t)_j \le \mathrm{val}(G^t)_j$. For $t = 0$ we have
$\mathrm{val}(H^t)_j = \mathrm{val}(G^t)_j$ by inspection of the algorithm. Now assume that the inequality holds for all
values smaller than $t$ and for all positions $i$; we will show that it holds for $t$ and all positions $j$.
Consider a fixed position $j$. There are three cases.

1. The position $j$ belongs to Max or Min. In this case, we observe that the function computed by
SolveDGG to determine the value of position $j$ in ModifiedValueIteration is a monotonically
increasing function. Also, the deterministic graphical game obtained when replacing coin toss
positions with terminals is the same for $G$ and for $H$. By the induction hypothesis, we have
that for all $i$, $\mathrm{val}(H^{t-1})_i \le \mathrm{val}(G^{t-1})_i$. So, $\mathrm{val}(H^t)_j \le \mathrm{val}(G^t)_j$.

2. The position $j$ is a coin toss position, but not $c$. In this case, we have

$$\mathrm{val}(G^t)_j = \tfrac{1}{2}\mathrm{val}(G^{t-1})_a + \tfrac{1}{2}\mathrm{val}(G^{t-1})_b$$

and

$$\mathrm{val}(H^t)_j = \tfrac{1}{2}\mathrm{val}(H^{t-1})_a + \tfrac{1}{2}\mathrm{val}(H^{t-1})_b,$$

where $a$ and $b$ are the successors of $j$. By the induction hypothesis, $\mathrm{val}(H^{t-1})_a \le \mathrm{val}(G^{t-1})_a$
and $\mathrm{val}(H^{t-1})_b \le \mathrm{val}(G^{t-1})_b$. Again, we have $\mathrm{val}(H^t)_j \le \mathrm{val}(G^t)_j$.
Figure 4: An example of $H_0$, with $n = 5$ and $r = 3$. Circle nodes are coin toss positions, triangle
nodes are Min positions and the node labeled GOAL is the GOAL position. Note that this particular
$H_0$ is not extremal.

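3. The position $j$ is equal to $c$. In this case, we have

$$\mathrm{val}(G^t)_c = \tfrac{1}{2}\mathrm{val}(G^{t-1})_a + \tfrac{1}{2}\mathrm{val}(G^{t-1})_k,$$

where $a$ and $k$ are the successors of $c$ in $G$, while

$$\mathrm{val}(H^t)_c = \tfrac{1}{2}\mathrm{val}(H^{t-1})_a + \tfrac{1}{2}\mathrm{val}(H^{t-1})_{k'}.$$

By the induction hypothesis we have $\mathrm{val}(H^{t-1})_a \le \mathrm{val}(G^{t-1})_a$. We also have that $\mathrm{val}(H^{t-1})_{k'} \le
\mathrm{val}(G^{t-1})_{k'}$ which is, by assumption, at most $\mathrm{val}(G^{t-1})_k$. So, we have $\mathrm{val}(H^t)_c \le \mathrm{val}(G^t)_c$. □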
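The statement of Lemma 11 can be illustrated on a game consisting only of coin toss positions, where modified value iteration reduces to repeated averaging. The Python sketch below uses a hypothetical encoding (position 0 is GOAL, `succ[k]` is the pair of successors of coin toss position `k`) and hypothetical helper names; it is an illustration of the lemma on one small instance, not part of the proof.

```python
from fractions import Fraction


def timed_values(succ, t_max):
    """Rows v^0 .. v^t_max of timed values for a coin-toss-only game (index 0 is GOAL)."""
    v = [Fraction(1)] + [Fraction(0)] * (len(succ) - 1)
    rows = [v]
    for _ in range(t_max):
        v = [Fraction(1)] + [(v[a] + v[b]) / 2 for a, b in succ[1:]]
        rows.append(v)
    return rows


def redirect(succ, c, k, k_new):
    """Copy of the game in which one arc from coin toss position c to k goes to k_new."""
    a, b = succ[c]
    new_pair = (k_new, b) if a == k else (a, k_new)
    return succ[:c] + [new_pair] + succ[c + 1:]


# G: position 1 -> (GOAL, 2), position 2 -> (GOAL, 3), position 3 -> (1, 3).
G = [None, (0, 2), (0, 3), (1, 3)]
# Redirect the arc from c = 2 to k = GOAL towards k' = 1; the hypothesis of the
# lemma holds because no timed value exceeds the value 1 of GOAL.
H = redirect(G, c=2, k=0, k_new=1)
```

Comparing `timed_values(G, t)` and `timed_values(H, t)` entry by entry shows that no timed value of `H` exceeds the corresponding one of `G`, as the lemma predicts.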

Theorem 12 $E_{n,r}$ is an extremal game in $S_{n,r}$.

Proof Let $H_0 \in S_{n,r}$, where $V_1 = \emptyset$ and all positions in $H_0$ have value one. We will show that for
all $t$ we have that $\max_k(\mathrm{val}(E_{n,r})_k - \mathrm{val}(E^t_{n,r})_k) \ge \max_{k'}(\mathrm{val}(H_0)_{k'} - \mathrm{val}(H_0^t)_{k'})$. Since by Lemma
9 we can take $H_0$ to be a $t$-extremal game for any $t$, $E_{n,r}$ is a $t$-extremal game for all $t$ and is hence
an extremal game.

To illustrate the proof we will use as running example the game in Figure 4.

We shall construct a sequence $H_1, H_2, \ldots, H_n$ of games (which will in fact be identical to $H_0$,
except that positions have been renumbered), so that $H_k$ has the following property $P_k$.

Property $P_k$: Any Min position $j$ among the positions $1, 2, \ldots, k$ has all successors within the
set $\{1, 2, \ldots, j-1, \text{GOAL}\}$. Any coin toss position $j$ among the positions $\{1, 2, \ldots, k\}$ has at least
one successor within $\{1, 2, \ldots, j-1, \text{GOAL}\}$.

Suppose we already constructed $H_j$ for $j < k$. We show how to construct $H_k$ based on $H_{k-1}$.
Applying Lemma 10 to the game $H_{k-1}$ with $V' = \{k, \ldots, n\}$, we find among the positions $k, \ldots, n$
either a coin toss position $u$ with one successor in $\{1, 2, \ldots, k-1, \text{GOAL}\}$ or a Min-position $u$ with
all successors in $\{1, 2, \ldots, k-1, \text{GOAL}\}$. In either case, we renumber $u$ to $k$ and $k$ to $u$ and let the
resulting game be $H_k$.
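The renumbering can be sketched as a greedy procedure. The Python code below is a variant of the construction rather than a literal transcription: instead of swapping indices step by step, it returns the positions in an order satisfying the property, always picking the lowest-indexed eligible position. The encoding (`kind`, `succ`, GOAL at index 0) and the function name are hypothetical.

```python
def lemma10_order(kind, succ):
    """Order the non-terminal positions of a Max-free game so that every Min position
    has all successors earlier in the order (or GOAL) and every coin toss position has
    at least one successor earlier in the order (or GOAL)."""
    placed = {0}                      # GOAL has index 0 and counts as already placed
    order = []
    remaining = list(range(1, len(kind)))
    while remaining:
        for u in remaining:
            hits = [s in placed for s in succ[u]]
            if (kind[u] == "min" and all(hits)) or (kind[u] == "coin" and any(hits)):
                break
        else:
            # By Lemma 10 this cannot happen when all positions have value one.
            raise ValueError("no eligible position: some position has value 0")
        order.append(u)
        placed.add(u)
        remaining.remove(u)
    return order


# A hypothetical game with n = 5, r = 3 in which every position has value one.
kind = ["goal", "min", "coin", "coin", "min", "coin"]
succ = [(), (2, 3), (0, 1), (2, 4), (0, 0), (5, 4)]
order = lemma10_order(kind, succ)
```

On this example the procedure places positions 2, 3, 1, 4, 5 in that order. If the Min-player could trap the play inside some set of positions, the loop would find no eligible position and raise an error, mirroring the contradiction in the proof of Lemma 10.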

Figure 5 shows $H_n$ for the case of our running example.

Each coin toss position in $H_n$ has at least one successor with a lower index than itself. (Recall
that GOAL has index 0.) In the following, we call this successor the lower successor and the other
successor the higher successor. If both successors in fact have a lower index than the position, we
choose one of the two successors arbitrarily as the higher successor.
Figure 5: $H_n$, if $H_0$ is the game in Figure 4.

Figure 6: $H'$, if $H_0$ is the game in Figure 4.



We now make a series of transformations from $H_n$ generating a new sequence of games. This
will take us outside the set of games in $S_{n,r}$, but the last game in the sequence will again be in
$S_{n,r}$. For each transformation, from $G'$ to $G''$, we will show that $\mathrm{val}(G''^t)_k \le \mathrm{val}(G'^t)_k$ for all $t$ and
$k$. For the final game $E_{n,r}$ we arrive at, we clearly have that $\mathrm{val}(E_{n,r})_k = 1$ for all $k$. This is in fact
also true for all intermediate games, but we shall not need that fact.

For each of the original non-terminal positions $1, 2, \ldots, n$ in $H_n$, we add a Min-position. We
assign index $n + j$ to the Min-position associated with position $j$. We let the two successors of
position $n + j$ be $j$ and $n + j - 1$, except for the case of $n + 1$, where we let the two successors
be 1 and GOAL. Let the resulting game be denoted $H'$. For our running example, $H'$ is shown in
Figure 6.

Applying Lemma 3 and inspecting the code of ModifiedValueIteration, we have that

$$\forall k \in \{1, 2, \ldots, n\}, t : \mathrm{val}(H'^t)_k = \mathrm{val}(H_n^t)_k.$$

We will only use that $\forall k \in \{1, 2, \ldots, n\}, t : \mathrm{val}(H'^t)_k \le \mathrm{val}(H_n^t)_k$.

We also have the following fact: $\mathrm{val}(H'^t)_{2n} = \min_{j \in \{1, 2, \ldots, n\}} \mathrm{val}(H'^t)_j$. Indeed, we can argue
by induction on $k$ that $\mathrm{val}(H'^t)_{n+k} = \min_{j \in \{1, 2, \ldots, k\}} \mathrm{val}(H'^t)_j$. For the base case, we have that
$\mathrm{val}(H'^t)_{n+1} = \min(1, \mathrm{val}(H'^t)_1)$. For $j > 1$, we have

$$\mathrm{val}(H'^t)_{n+j} = \min(\mathrm{val}(H'^t)_j, \mathrm{val}(H'^t)_{n+j-1}),$$

completing the proof. This property, and the fact that the proof only used information about the
successors of $n + k$ for $k \in \{1, 2, \ldots, n\}$, allows us to apply Lemma 11 iteratively to modify the
Figure 7: $H'_0$, if $H_0$ is the game of Figure 4.

Figure 8: $H'_r$, if $H_0$ is the game in Figure 4.

game by changing the higher successor of each and every coin toss position to be the position $2n$.
We denote this game $H'_0$.

For our running example, $H'_0$ is shown in Figure 7.

Let the coin toss positions in $H'_0$ be $i_1 < i_2 < \cdots < i_r$. Then, we define a sequence of games
$H'_1, H'_2, \ldots, H'_r$ as follows. We define $H'_1$ from $H'_0$ by changing the lower successor of $i_1$ to GOAL.
For $j > 1$, we define $H'_j$ from $H'_{j-1}$ by changing the lower successor of $i_j$ to be $i_{j-1}$.

For our running example, $H'_r$ is given in Figure 8.

Claim. For $t \ge 0$, $j \in \{1, \ldots, r + 1\}$, the following holds. For a position with index $k$ strictly
smaller than $i_j$, we have $\mathrm{val}(H'^t_{j-1})_k \ge \mathrm{val}(H'^t_{j-1})_{i_{j-1}}$. Here, by convention, we let $i_0$ be the GOAL
position when considering the statement for $j = 1$ and we let $i_{r+1}$ be $\infty$ when considering the
statement for $j = r + 1$.

Proof of claim. The proof is by induction on $j$.

Clearly, $\mathrm{val}(H'^t_{j'})_k = 1$ for all positions $k$ in $1, 2, \ldots, i_1 - 1$ and for all $j'$ and $t$, so this settles
the base case of $j = 1$.

For larger values of $j$, and $k < i_{j-1}$, we have by construction that $\mathrm{val}(H'^t_{j-1})_k = \mathrm{val}(H'^t_{j-2})_k$.
Now, $k > i_{j-1}$. Therefore there are two cases. Either $k$ is a coin toss position or $k$ is a Min-position.
If $k$ is a coin toss position, we can without loss of generality assume that $k = i_{j-2}$, since
it has the smallest value among all coin toss positions, by the induction hypothesis. Therefore,
we only need to show that $\forall t : \mathrm{val}(H'^t_{j-1})_{i_{j-2}} \ge \mathrm{val}(H'^t_{j-1})_{i_{j-1}}$. For $j = 2$, we have that $\forall t :
\mathrm{val}(H'^t_1)_{i_0} = 1 \ge \mathrm{val}(H'^t_1)_{i_1}$. For $j \ge 3$, we have by the properties of the algorithm that
$\mathrm{val}(H'^0_{j-1})_{i_{j-2}} = 0 = \mathrm{val}(H'^0_{j-1})_{i_{j-1}}$ and $\forall t \ge 1 : \mathrm{val}(H'^t_{j-1})_{i_{j-2}} = \tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{i_{j-3}} + \tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{2n}$.
We have by the induction hypothesis that

$$\tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{i_{j-3}} + \tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{2n} \ge \tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{i_{j-2}} + \tfrac{1}{2}\mathrm{val}(H'^{t-1}_{j-1})_{2n} = \mathrm{val}(H'^t_{j-1})_{i_{j-1}}.$$

If $k$ is a Min-position in $\{i_{j-1} + 1, \ldots, i_j - 1\}$, assume to the contrary that some $k$ fails to
satisfy $\mathrm{val}(H'^t_{j-1})_k \ge \mathrm{val}(H'^t_{j-1})_{i_{j-1}}$. Consider the smallest such $k$. As $k$ is a Min-position with
two successors both of which are smaller than $k$, we have that $\mathrm{val}(H'^t_{j-1})_k$ is the minimum of
two numbers, both of which are at least $\mathrm{val}(H'^t_{j-1})_{i_{j-1}}$, either by the induction hypothesis or the
assumption that $k$ is minimal. This completes the proof of the claim.

By the claim, for all positions $k \in \{1, 2, \ldots, 2n\}$ we have that $\forall t \ge 0 : \mathrm{val}({H'}^{t}_{r})_k \ge \mathrm{val}({H'}^{t}_{r})_{i_r}$.
Also, by an induction argument identical to the one we used to argue a similar property for $H'$,
we have $\forall t \ge 0 : \mathrm{val}({H'}^{t}_{r})_{2n} = \mathrm{val}({H'}^{t}_{r})_{i_r}$. Thus we may define the game $H''$ from $H'_r$ by applying
Lemma 11 and changing all higher successors of all coin toss positions $i_j$ to be $i_r$ (instead of $2n$).

For our running example, $H''$ is the game of Figure 9.

Finally, we arrive at $E_{n,r}$ by subsequently removing all "new" Min-positions $n + 1, \ldots, 2n$ and
changing all successors of all original Min-positions of $H''$ to be the GOAL position.

For our running example, $E_{5,3}$ is the extremal game identified by the proof, and is given in
Figure 10. Note that the same game would also be identified for any other game with $r = 3$ and
$n = 5$ (up to the indexing of the positions).
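The coin toss part of $E_{n,r}$ can be explored numerically. The following is a minimal sketch, not the paper's implementation: it assumes that one round of the modified value iteration amounts to one synchronous update of the coin toss positions, that all non-GOAL values start at 0, and that the Min-positions can be ignored because they all point to GOAL. The function name `coin_toss_values` is hypothetical.

```python
from fractions import Fraction


def coin_toss_values(r, t):
    """Values of the coin toss positions i_1..i_r of E_{n,r} after t rounds.

    Index 0 stands for GOAL (value 1) and index j stands for i_j.
    Position i_j moves to i_{j-1} (GOAL when j = 1) or to i_r,
    each with probability 1/2. Exact arithmetic via Fraction.
    """
    vals = [Fraction(1)] + [Fraction(0)] * r
    for _ in range(t):
        prev = vals
        # Synchronous update: every position averages its two successors.
        vals = [Fraction(1)] + [
            (prev[j - 1] + prev[r]) / 2 for j in range(1, r + 1)
        ]
    return vals


if __name__ == "__main__":
    # Running example with r = 3 coin toss positions.
    for t in range(6):
        print(t, [str(v) for v in coin_toss_values(3, t)[1:]])
```

In this sketch the last entry, the value at $i_r$, is the smallest in every round, which matches the role $i_r$ plays in the argument above.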

It is easy to see that all positions in $E_{n,r}$ have value 1. We have that each Min-position $k$ in
$H''$ satisfies $\mathrm{val}({H''}^{t})_k \ge \mathrm{val}({H''}^{t})_{i_r}$.

We have that $\mathrm{val}(H^{t})_k \ge \mathrm{val}({H''}^{t})_{i_r} = \mathrm{val}(E^{t}_{n,r})_{i_r}$ for all $k \in \{1, 2, \ldots, n\}$, since we found $H''$
either by applying Lemma 11 or in a way that did not change the value of any position for any
time bound.

Also, $E_{n,r} \in S_{n,r}$. Therefore, since at least one possible option for $H_0 \in S_{n,r}$ (and therefore $H_\pi$,
since $H_\pi$ was a reindexing of the positions in $H_0$) was a $t$-extremal game for any $t$, $E_{n,r}$ is extremal.
This completes the proof of the lemma.

Figure 10: The game $E_{5,3}$, up to indexing of the positions.

□ 

Having identified the extremal game $E_{n,r}$, we next estimate $T(E_{n,r})$.

Lemma 13 For all $n, r > 0$, $\epsilon > 0$, $t \ge 2(\ln 2\epsilon^{-1}) \cdot 2^r$, and $k \in V$: $\mathrm{val}(E_{n,r})_k - \mathrm{val}(E^{t}_{n,r})_k \le \epsilon$.

Proof We observe that for the purposes of estimating $\mathrm{val}(E_{n,r})_k - \mathrm{val}(E^{t}_{n,r})_k$, we can view $E_{n,r}$
as a game containing $r$ coin toss positions only, since all Min-positions point directly to GOAL.
Also, when modified value iteration is applied to a game $G$ containing only coin toss positions,
Lemma 3 implies that $\mathrm{val}(G^{t})_k$ can be reinterpreted as the probability that the absorbing Markov
process starting in state $k$ is absorbed within $t$ steps. By the structure of $E_{n,r}$, this is equal to the
probability that a sequence of $t$ fair coin tosses contains $r$ consecutive tails. This is known to be
exactly $1 - F^{(r)}_{t+2}/2^t$, where $F^{(r)}_{t+2}$ is the $(t+2)$'nd Fibonacci $r$-step number, i.e. the number given by
the linear homogeneous recurrence $F^{(r)}_m = \sum_{i=1}^{r} F^{(r)}_{m-i}$ and the boundary conditions $F^{(r)}_m = 0$ for
$m \le 0$, $F^{(r)}_1 = F^{(r)}_2 = 1$. Asymptotically solving this linear recurrence, we have that $F^{(r)}_m \le (\phi_r)^{m-1}$
where $\phi_r$ is the root near 2 to the equation $x + x^{-r} = 2$. Clearly, $\phi_r < 2 - 2^{-r}$, so

$$F^{(r)}_{t+2}/2^t \le \frac{(2 - 2^{-r})^{t+1}}{2^t} = 2(1 - 2^{-r-1})^{t+1} < 2(1 - 2^{-r-1})^{t}.$$

Therefore, the probability that the chain is not absorbed within $t = 2(\ln 2\epsilon^{-1}) \cdot 2^r$ steps is at most

$$2(1 - 2^{-r-1})^{2(\ln 2\epsilon^{-1}) \cdot 2^r} \le 2e^{-\ln 2\epsilon^{-1}} = \epsilon.$$

□ 
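The ingredients of this proof can be checked numerically for small parameters. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the standard convention for Fibonacci $r$-step numbers (zero for non-positive indices, one at indices 1 and 2), computes the run probability by a direct dynamic program, and compares both with the bound $2(1 - 2^{-r-1})^t$. All function names are hypothetical.

```python
import math
from fractions import Fraction


def fib_r_step(r, m):
    """Fibonacci r-step number: 0 for m <= 0, 1 for m in {1, 2},
    otherwise the sum of the previous r terms."""
    if m <= 0:
        return 0
    f = {1: 1, 2: 1}
    for k in range(3, m + 1):
        f[k] = sum(f.get(k - i, 0) for i in range(1, r + 1))
    return f[m]


def prob_run_of_tails(r, t):
    """Probability that t fair coin tosses contain r consecutive tails."""
    # state[s]: probability of s trailing tails (s < r) with no run so far.
    state = [Fraction(1)] + [Fraction(0)] * (r - 1)
    done = Fraction(0)
    for _ in range(t):
        nxt = [Fraction(0)] * r
        for s, p in enumerate(state):
            nxt[0] += p / 2              # heads resets the current run
            if s + 1 == r:
                done += p / 2            # tails completes a run of length r
            else:
                nxt[s + 1] += p / 2      # tails extends the current run
        state = nxt
    return done


def check(r, t):
    """True if the closed form and the upper bound both hold for (r, t)."""
    not_absorbed = Fraction(fib_r_step(r, t + 2), 2 ** t)
    exact = prob_run_of_tails(r, t) == 1 - not_absorbed
    bounded = not_absorbed <= 2 * (1 - Fraction(1, 2 ** (r + 1))) ** t
    return exact and bounded


def rounds_needed(r, eps):
    """The threshold 2 * ln(2 / eps) * 2^r of Lemma 13, rounded up."""
    return math.ceil(2 * math.log(2 / eps) * 2 ** r)
```

For instance, `check(r, t)` can be evaluated over a small grid of `r` and `t`, and `rounds_needed(r, eps)` gives a number of rounds after which the non-absorption probability should be at most `eps`.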

Corollary 14 For all $n, r > 0$: $T(E_{n,r}) \le 2(\ln 2^{5r+1}) \cdot 2^r$.

Proof The proof is by insertion into Lemma 13. □
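The insertion can be spelled out as follows. This worked step assumes that $T(E_{n,r})$ is measured against the error bound $\epsilon = 2^{-5r}$, which is the additive error quoted in the Conclusions; the definition of $T$ itself lies outside this excerpt.

```latex
\begin{align*}
\epsilon = 2^{-5r}
  &\;\Longrightarrow\; 2\epsilon^{-1} = 2^{5r+1}, \\
2(\ln 2\epsilon^{-1}) \cdot 2^{r}
  &= 2(\ln 2^{5r+1}) \cdot 2^{r}
   = (5r+1)(\ln 2)\, 2^{r+1}.
\end{align*}
```

Under that assumption the bound grows as $r \cdot 2^r$ up to a constant factor and does not depend on $n$.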

4 Conclusions 

We have presented an algorithm for solving simple stochastic games whose worst-case running time,
as a function of the number of coin toss positions, improves on that of previous algorithms.
It is relevant to observe that the best case complexity of the algorithm is strongly related to its
worst case complexity, as the number of iterations of the main loop is fixed in advance.

As mentioned in the introduction, our paper is partly motivated by a result of Chatterjee et 
al. [4] analysing the strategy iteration algorithm of the same authors [5] for the case of simple 
stochastic games. We can in fact improve their analysis, using the techniques of this paper. We 
combine three facts: 

1. ([5, Lemma 8]) For a game $G \in S_{n,r}$, after $t$ iterations of the strategy iteration algorithm [5]
applied to $G$, the (positional) strategy computed for Max guarantees a probability of winning
of at least $\mathrm{val}(G^{t})_k$ against any strategy of the opponent when play starts in position $k$, where
$G^{t}$ is the game defined from $G$ in Definition 1 of this paper.

2. For a game $G \in S_{n,r}$ and all $k, t$, $\mathrm{val}(G^{t(n-r+1)})_k \ge \mathrm{val}(G^{t})_k$. This is a direct consequence
of the definitions of the two games, and the fact that in an optimal play, either a coin toss
position is encountered at least after every $n - r + 1$ moves of the pebble, or never again.

3. Corollary 14 of the present paper.

These three facts together imply that the strategy iteration algorithm after $2(\ln 2^{5r+1}) \cdot 2^r (n - r + 1)$
iterations has computed a strategy that guarantees the values of the game within an additive error
of $2^{-5r}$, for $r > 6$. As observed by Chatterjee et al. [3], such a strategy is in fact optimal. Hence,
we conclude that their strategy iteration algorithm terminates in time $2^r n^{O(1)}$. This improves
their analysis of the algorithm significantly, but still yields a bound on its worst case running
time inferior to the worst case running time of the algorithm presented here. On the other hand,
unlike the algorithm presented in this paper, their algorithm has the desirable property that it may
terminate faster than its worst case analysis suggests.
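To get a feeling for the size of the iteration bound above, it can be evaluated for concrete parameters. This is a small sketch of the arithmetic only; the function name `strategy_iteration_bound` is hypothetical, and the result is rounded up to an integer for convenience.

```python
import math


def strategy_iteration_bound(n, r):
    """Evaluate 2 * ln(2^(5r+1)) * 2^r * (n - r + 1), rounded up.

    n: number of non-terminal positions, r: number of coin toss positions.
    """
    rounds = 2 * (5 * r + 1) * math.log(2) * 2 ** r
    return math.ceil(rounds * (n - r + 1))


if __name__ == "__main__":
    # Example parameters chosen only for illustration.
    print(strategy_iteration_bound(10, 7))
```

The bound is linear in $n - r + 1$ and is dominated by the factor $2^r$ as $r$ grows.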